-
Background: The advancement of sequencing technology has led to a rapid increase in the amount of DNA and protein sequence data; consequently, the size of genomic and proteomic databases is constantly growing. As a result, database searches need to be continually updated to account for the new data being added. However, continually re-searching the entire existing dataset wastes resources. Incremental database search can address this problem. Methods: One recently introduced incremental search method is iBlast, which wraps the BLAST sequence search method with an algorithm that reuses previously processed data and thereby increases search efficiency. The iBlast wrapper, however, must be generalized to support the better-performing DNA/protein sequence search methods that have since been developed, namely MMseqs2 and Diamond. To address this need, we propose iSeqsSearch, which extends iBlast by incorporating support for MMseqs2 (iMMseqs2) and Diamond (iDiamond), thereby providing a more generalized and broadly effective incremental search framework. Moreover, the previously published iBlast wrapper has to be revised to be more robust and usable by the general community. Results: iMMseqs2 and iDiamond, which apply the incremental approach, perform nearly identically to MMseqs2 and Diamond. Notably, when comparing hit rankings with measures such as the Pearson correlation, we observe a high concordance of over 0.9, indicating similar results. Moreover, in some cases, our incremental approach, iSeqsSearch, which extends the iBlast merge function to iMMseqs2 and iDiamond, provides more hits than the conventional MMseqs2 and Diamond methods. Conclusion: The incremental approach using iMMseqs2 and iDiamond reuses previously processed data efficiently while maintaining high accuracy and concordance in search results. This method can reduce resource waste in searches of continually growing genomic and proteomic databases. The sample code and data are available at GitHub and Zenodo (https://github.com/EESI/Incremental-Protein-Search; DOI:10.5281/zenodo.14675319).
Free, publicly-accessible full text available April 28, 2026
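The merge step at the heart of an incremental search can be sketched as follows. This is a minimal illustration, not the iSeqsSearch implementation: it assumes that E-values scale linearly with database size (as in BLAST's Karlin-Altschul statistics), and the `Hit` record, function names, and the `max_evalue` cutoff are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    bitscore: float
    evalue: float  # E-value as reported against the partial database


def rescale(hits, part_size, total_size):
    """Rescale E-values from a partial database to the combined database.

    Assumption: E-values are proportional to database size.
    """
    factor = total_size / part_size
    return [Hit(h.query, h.subject, h.bitscore, h.evalue * factor) for h in hits]


def merge_incremental(old_hits, old_size, new_hits, new_size, max_evalue=1e-3):
    """Combine cached hits with hits from the newly added sequences only."""
    total = old_size + new_size
    merged = rescale(old_hits, old_size, total) + rescale(new_hits, new_size, total)
    kept = [h for h in merged if h.evalue <= max_evalue]
    # Best hits first within each query: low E-value, then high bit score.
    return sorted(kept, key=lambda h: (h.query, h.evalue, -h.bitscore))


# Hypothetical example: cached results plus a search of the increment alone.
old = [Hit("q1", "sA", 120.0, 1e-30), Hit("q1", "sB", 40.0, 4e-4)]
new = [Hit("q1", "sC", 90.0, 1e-20)]
merged = merge_incremental(old, 3_000_000, new, 1_000_000)
print([h.subject for h in merged])
```

Only the increment is searched; the cached hits are rescaled rather than recomputed, which is where the savings would come from under these assumptions.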
-
Objectives: We aim to estimate geographic variability in total numbers of infections and infection fatality ratios (IFR; the number of deaths caused by an infection per 1,000 infected people) when the availability and quality of data on disease burden are limited during an epidemic. Methods: We develop a noncentral hypergeometric framework that accounts for differential probabilities of positive tests and reflects the fact that symptomatic people are more likely to seek testing. We demonstrate the robustness, accuracy, and precision of this framework, and apply it to the United States (U.S.) COVID-19 pandemic to estimate county-level SARS-CoV-2 IFRs. Results: The estimators for the numbers of infections and IFRs showed high accuracy and precision; for instance, when applied to simulated validation data sets, across counties, Pearson correlation coefficients between estimator means and true values were 0.996 and 0.928, respectively, and they showed strong robustness to model misspecification. Applying the county-level estimators to the real, unsimulated COVID-19 data spanning April 1, 2020 to September 30, 2020 from across the U.S., we found that IFRs varied from 0 to 44.69, with a standard deviation of 3.55 and a median of 2.14. Conclusions: The proposed estimation framework can be used to identify geographic variation in IFRs across settings.
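To illustrate how a noncentral hypergeometric distribution captures biased testing, here is a minimal sketch. The abstract does not say which variant the authors use; this sketch assumes Fisher's form, in which infected people are `odds` times as likely to be tested as uninfected people. The function names and all numbers are hypothetical.

```python
from math import comb


def fisher_nchg_pmf(x, n_infected, n_uninfected, n_tested, odds):
    """P(x positives among n_tested) under Fisher's noncentral hypergeometric.

    Assumption: 'odds' is the odds ratio of being tested, infected vs. uninfected.
    """
    lo = max(0, n_tested - n_uninfected)
    hi = min(n_tested, n_infected)
    if not lo <= x <= hi:
        return 0.0

    def weight(k):
        return comb(n_infected, k) * comb(n_uninfected, n_tested - k) * odds ** k

    return weight(x) / sum(weight(k) for k in range(lo, hi + 1))


def ifr_per_1000(deaths, infections):
    """Infection fatality ratio expressed as deaths per 1,000 infected people."""
    return 1000.0 * deaths / infections


# Hypothetical county: 20 infected, 80 uninfected, 10 tests performed.
mean_equal = sum(x * fisher_nchg_pmf(x, 20, 80, 10, 1.0) for x in range(11))
mean_biased = sum(x * fisher_nchg_pmf(x, 20, 80, 10, 5.0) for x in range(11))
print(mean_equal, mean_biased)  # biased testing yields more expected positives
print(ifr_per_1000(3, 1400))
```

With `odds = 1.0` the distribution reduces to the ordinary hypergeometric, so the expected number of positives equals the tested count times the true prevalence; larger odds inflate the positive count, which is the bias an estimator of this kind has to undo.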
-
Synopsis: New biophysical theory and electronic databases raise the prospect of deriving fundamental rules of life, a conceptual framework for how the structures and functions of molecules, cells, and individual organisms give rise to emergent patterns and processes of ecology, evolution, and biodiversity. This framework is very general, applying across taxa of animals from 10^-10 g protists to 10^8 g whales, and across environments from deserts and abyssal depths to rain forests and coral reefs. It has several hallmarks: (1) Energy is the ultimate limiting resource for organisms and the currency of biological fitness. (2) Most organisms are nearly equally fit, because in each generation at steady state they transfer an equal quantity of energy (~22.4 kJ/g) and biomass (~1 g/g) to surviving offspring. This is the equal fitness paradigm (EFP). (3) The enormous diversity of life histories is due largely to variation in metabolic rates (e.g., energy uptake and expenditure via assimilation, respiration, and production) and biological times (e.g., generation time). As in standard allometric and metabolic theory, most physiological and life history traits scale approximately as quarter-power functions of body mass, m (rates as ~m^(-1/4) and times as ~m^(1/4)), and as exponential functions of temperature. (4) Time is the fourth dimension of life. Generation time is the pace of life. (5) There is, however, considerable variation not accounted for by the above scalings and existing theories. Much of this "unexplained" variation is due to natural selection on life history traits to adapt the biological times of generations to the clock times of geochronological environmental cycles. (6) Most work on biological scaling and metabolic ecology has focused on respiration rate. The emerging synthesis applies conceptual foundations of energetics and the EFP to shift the focus to production rate and generation time.
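The quarter-power scalings in hallmark (3) can be made concrete with a short worked calculation. This sketch covers only the mass dependence; it ignores temperature, normalization constants, and the unexplained variation noted in hallmark (5). The function names are illustrative.

```python
def relative_rate(mass_g, ref_mass_g=1.0):
    """Rate relative to a reference organism, assuming rate ~ m^(-1/4)."""
    return (mass_g / ref_mass_g) ** -0.25


def relative_time(mass_g, ref_mass_g=1.0):
    """Biological time relative to a reference organism, assuming time ~ m^(1/4)."""
    return (mass_g / ref_mass_g) ** 0.25


# Mass range quoted in the synopsis: 10^-10 g protists to 10^8 g whales.
protist_g, whale_g = 1e-10, 1e8

time_ratio = relative_time(whale_g, protist_g)  # (10^18)^(1/4) = 10^4.5
rate_ratio = relative_rate(whale_g, protist_g)  # (10^18)^(-1/4) = 10^-4.5

print(f"generation time ratio ~ {time_ratio:.0f}")  # about 31623
print(f"rate ratio ~ {rate_ratio:.2e}")
print(f"rate x time ~ {rate_ratio * time_ratio:.3f}")
```

Under these exponents, 18 orders of magnitude in body mass compress to about 4.5 orders of magnitude in rates and times, and the product of rate and time is independent of mass, which is consistent with the equal-transfer-per-generation idea in hallmark (2).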
-
Nucleotide base composition plays an influential role in the molecular mechanisms involved in gene function, phenotype, and amino acid composition. GC content (proportion of guanine and cytosine in DNA sequences) shows a high level of variation within and among species. Many studies measure GC content in a small number of genes, which may not be representative of genome-wide GC variation. One challenge when assembling extensive genomic data sets for these studies is the significant amount of resources (monetary and computational) associated with data processing, and many bioinformatic tools have not been optimized for resource efficiency. Using a high-performance computing (HPC) cluster, we manipulated resources provided to the targeted gene assembly program, automated target restricted assembly method (aTRAM), to determine an optimum way to run the program to maximize resource use. Using our optimum assembly approach, we assembled and measured GC content of all of the protein-coding genes of a diverse group of parasitic feather lice. Of the 499 426 genes assembled across 57 species, feather lice were GC-poor (mean GC = 42.96%) with a significant amount of variation within and between species (GC range = 19.57%-73.33%). We found a significant correlation between GC content and standard deviation per taxon for overall GC and GC3, which could indicate selection for G and C nucleotides in some species. Phylogenetic signal of GC content was detected in both GC and GC3. This research provides a large-scale investigation of GC content in parasitic lice, laying the foundation for understanding the basis of variation in base composition across species.
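For readers unfamiliar with the two measures, overall GC and GC3 (GC content at third codon positions) can be computed as in the sketch below. It assumes an in-frame coding sequence and skips ambiguous bases; the function names are illustrative and this is not the pipeline used in the study.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(base in "GC" for base in called) / len(called)


def gc3_content(cds):
    """GC percentage at third codon positions of an in-frame coding sequence."""
    # Slicing from index 2 in steps of 3 picks the third base of each complete codon.
    return gc_content(cds[2::3])


# Toy coding sequence: ATG GCC AAA TGA
cds = "ATGGCCAAATGA"
print(round(gc_content(cds), 2))  # 41.67 (5 of 12 bases are G or C)
print(gc3_content(cds))           # 50.0 (third positions are G, C, A, A)
```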
-
Chain-transfer ring-opening metathesis polymerization (CT-ROMP) previously provided a route to carboxytelechelic polyethylene (PE) of controlled molecular weight; however, the incorporation of oligomeric PE into segmented copolymers remains unexplored. Herein, CT-ROMP afforded carboxytelechelic polycyclooctene segments, and subsequent reduction generated well-defined carboxytelechelic PE with M_n = 3900 g mol^-1. Solvent-free melt polycondensation of neopentyl glycol and adipic acid with varying wt% telechelic PE oligomers yielded mechanically durable segmented copolyesters. The thermal and thermomechanical properties of the segmented copolyesters correlated with PE segment content, and high PE content copolymers exhibited remarkably similar morphologies and thermomechanical performance to conventional HDPE. The segmented copolyesters displayed advantageous physical properties while introducing susceptibility to chemo- and bio-catalytic depolymerization through periodic ester linkages, thus providing valuable fundamental understanding of an alternative route to HDPE.
-
Quantum algorithms are touted as a way around some classically intractable problems such as the simulation of quantum mechanics. At the end of all quantum algorithms is a quantum measurement whereby classical data is extracted and utilized. In fact, many of the modern hybrid-classical approaches are essentially quantum measurements of states with short quantum circuit descriptions. Here, we compare and examine three methods of extracting the time-dependent one-particle probability density from a quantum simulation: direct Z-measurement, Bayesian phase estimation, and harmonic inversion. We have tested these methods in the context of the potential inversion problem of time-dependent density functional theory. Our test results suggest that direct measurement is the preferable method. We also highlight areas where the other two methods may be useful and report on tests using Rigetti's quantum virtual device. This study provides a starting point for imminent applications of quantum computing.
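As a rough illustration of the direct Z-measurement idea, the sketch below estimates a site occupation from repeated single-qubit measurements. It assumes an encoding with one qubit per lattice site where outcome 1 means "occupied", which the abstract does not specify; the shots are simulated classically with Python's random module rather than on any quantum device, and all names are hypothetical.

```python
import random


def density_from_shots(shots):
    """Estimate site occupation from Z-basis outcomes (0 or 1) of one qubit.

    Uses <Z> = mean of (+1 for outcome 0, -1 for outcome 1) and n = (1 - <Z>) / 2.
    """
    z_mean = sum(1 - 2 * bit for bit in shots) / len(shots)
    return (1 - z_mean) / 2


def simulate_shots(p_occupied, n_shots, seed=0):
    """Classical stand-in for repeated measurement of a qubit with P(1) = p_occupied."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_occupied else 0 for _ in range(n_shots)]


# Hypothetical site with true occupation 0.3, measured 20,000 times.
estimate = density_from_shots(simulate_shots(0.3, 20_000))
print(estimate)
```

The statistical error of such an estimate shrinks as one over the square root of the number of shots, so the shot budget is the main cost of this approach.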
